-
Notifications
You must be signed in to change notification settings - Fork 394
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Binary sequence estimator #84
Conversation
Thanks for the contribution! Before we can merge this, we need @mikeloh77 to sign the Salesforce.com Contributor License Agreement. |
val convertI1 = FeatureTypeSparkConverter[I1]() | ||
val convertI2 = FeatureTypeSparkConverter[I2]() | ||
val func = (r: Row) => { | ||
val arr = new ArrayBuffer[I2](r.length-1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add spaces r.length - 1
@@ -470,6 +470,46 @@ case object FeatureSparkTypes { | |||
} | |||
} | |||
|
|||
def udf2N[I1 <: FeatureType : TypeTag, I2 <: FeatureType : TypeTag, O <: FeatureType : TypeTag] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs
UserDefinedFunction(func, outputType, inputTypes = None) | ||
} | ||
|
||
def transform2N[I1 <: FeatureType : TypeTag, I2 <: FeatureType : TypeTag, O <: FeatureType: TypeTag] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs here as well
@@ -549,6 +549,43 @@ trait OpPipelineStageN[I <: FeatureType, O <: FeatureType] extends OpPipelineSta | |||
)(tto) | |||
} | |||
|
|||
/** | |||
* Pipeline stage of single Feature of type I1 with multiple Features of type I2 to output 1Feature of type O |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo 1Feature
implicit val tti1: TypeTag[I1], | ||
val tti2: TypeTag[I2], | ||
val tto: TypeTag[O], | ||
val tti1v: TypeTag[I1#Value], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please rename tti1v
-> ttiv1
and tti2v
-> ttiv2
(since we have those in other classes as well) + update the docs
val columns = Array(col(in1.name)) ++ seqColumns | ||
|
||
val df = dataset.select(columns: _*) | ||
val ds = df.map(r => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should be using the converters here instead:
val ds = df.map { r =>
val rowSeq = r.toSeq
(convertI1.fromSpark(rowSeq.head), rowSeq.tail.map(seqIConvert.fromSpark(_).value))
}
import scala.reflect.runtime.universe.TypeTag | ||
import scala.util.Try | ||
|
||
trait OpTransformer2N[I1 <: FeatureType, I2 <: FeatureType, O <: FeatureType] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs
* @return a fitted model that will perform the transformation specified by the function defined in constructor fit | ||
*/ | ||
override def fit(dataset: Dataset[_]): BinarySequenceModel[I1, I2, O] = { | ||
assert(inN.nonEmpty, "Inputs cannot be empty") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps getTransientFeatures.nonEmpty
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a check specifically on the sequence input getTransientFeatures
will be nonempty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then getTransientFeatures.size > 1
, so it will include the check of existence of in1 as well
* @return a new dataset containing a column for the transformed feature | ||
*/ | ||
override def transform(dataset: Dataset[_]): DataFrame = { | ||
assert(inN.nonEmpty, "Inputs cannot be empty") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps getTransientFeatures.nonEmpty
here as well?
Set("-1.0", "three", "four").toMultiPickList, | ||
Set("15.0", "five", "six").toMultiPickList, | ||
Set("1.111", "seven", "").toMultiPickList | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.map(_.toMultiPickList)
Vectors.dense(0.4, 0.5, Double.PositiveInfinity).toOPVector, | ||
Vectors.dense(0.4, 1.0, Double.PositiveInfinity).toOPVector, | ||
Vectors.dense(0.2, 0.5, Double.PositiveInfinity).toOPVector | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.map(_.toOPVector)
Codecov Report
@@ Coverage Diff @@
## master #84 +/- ##
==========================================
+ Coverage 86.08% 86.17% +0.09%
==========================================
Files 292 294 +2
Lines 9490 9550 +60
Branches 316 528 +212
==========================================
+ Hits 8169 8230 +61
+ Misses 1321 1320 -1
Continue to review full report at Codecov.
|
dataset.select(col("*"), functionUDF(struct(columns: _*)).as(getOutputFeatureName, meta)) | ||
} | ||
|
||
private val transformNFn = FeatureSparkTypes.transform2N[I1, I2, O](transformFn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private lazy val
} | ||
|
||
private val transformNFn = FeatureSparkTypes.transform2N[I1, I2, O](transformFn) | ||
override def transformKeyValue: KeyValue => Any = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lazy val
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what should be lazy here. Please clarify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mikeloh77 you dont want to know 😛 but if you are curious, mainly because of SelectedModel class in ModelSelector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
…ml/BinarySequenceEsimator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!!
It would be useful to have an Estimator/Transformer combo where it can take a single feature of type I1 and a sequence of features of type I2. This is to augment the concept of the SequenceEstimator.